COMBINATORICS OF PEDIGREES

BHALCHANDRA D. THATTE

Abstract. A pedigree is a directed graph in which each vertex
(except the founder vertices) has two parents. The main result
of this paper is a construction of an infinite family of counter
examples to a reconstruction problem on pedigrees, thus negatively
answering a question of Steel and Hein. Some positive
reconstruction results are also presented. The problem of counting
distinct (mutually non-isomorphic) pedigrees is considered. The
known lower and upper bounds on the number of pedigrees are
improved upon, and their relevance to pedigree reconstruction from
DNA sequence data is discussed. It is shown that the information
theoretic bound on the number of segregating sites in the sequence
data that is minimally essential for reconstructing pedigrees would
not significantly change with improved enumerative estimates.



1. Introduction 

A general pedigree $P(X_0)$ on a set $X_0$ is a finite directed graph
on a vertex set $V$ that satisfies the following conditions:

(1) each vertex has out-degree 0 or 2;

(2) $X_0$ is a subset of $V$, and each vertex in $X_0$ has in-degree 0;

(3) there are no isolated vertices.

The vertices with out-degree 0 are called the founders. The vertices
in $X_0$ are called extant. The cardinality of $X_0$ is called the order
of the pedigree. Note that $X_0$ is a subset of the set of vertices with
in-degree 0.
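
The three conditions can be checked mechanically. The following is a minimal sketch, assuming a pedigree is stored as a Python dictionary that maps each non-founder vertex to a pair of parents (arcs point from child to parent). The representation, the function names, and the extra requirement that the two parents be distinct are assumptions made for illustration, not notation from the paper.

```python
def vertex_set(parents):
    """All vertices mentioned in the parent map (children and parents)."""
    vertices = set(parents)
    for pair in parents.values():
        vertices.update(pair)
    return vertices


def is_general_pedigree(parents, extant):
    """Check conditions (1)-(3) for arcs child -> parent.

    parents: dict mapping a vertex to a tuple of its parents
             (hypothetical representation; founders may be omitted).
    extant:  iterable of extant vertices.
    """
    vertices = vertex_set(parents)
    extant = set(extant)
    # (1) every vertex has out-degree 0 or 2 (assumed: two distinct parents)
    for v in vertices:
        pair = parents.get(v, ())
        if len(pair) not in (0, 2) or len(set(pair)) != len(pair):
            return False
    # vertices with positive in-degree, i.e. vertices that have a child
    has_child = {p for pair in parents.values() for p in pair}
    # (2) extant vertices belong to V and have in-degree 0
    if not extant <= vertices or extant & has_child:
        return False
    # (3) no isolated vertices: each vertex has a parent or a child
    return all(parents.get(v) or v in has_child for v in vertices)


def founders(parents):
    """Vertices with out-degree 0."""
    return {v for v in vertex_set(parents) if not parents.get(v)}
```

For instance, two extant vertices `x1` and `x2` with parents `(a, b)` and `(b, c)` form a pedigree of order 2 with founders `a`, `b`, `c`.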

A discrete generation pedigree on $X_0$ is a pedigree on vertex set
$V = \cup_{i=0}^{d} X_i$, where the $X_i$ are disjoint sets, $X_d$ is the
set of founders, and every vertex $u$ in $X_i$, $i < d$, has outgoing arcs
$uv$ and $uw$ to vertices $v$ and $w$, respectively, in $X_{i+1}$. In this
case, $d$ is the depth of the pedigree.

If there is an arc from a vertex $u$ to a vertex $v$, then $v$ is called a
parent of $u$, and $u$ is called a child of $v$. If there is a directed
path from a vertex $u$ to a vertex $v$ in a pedigree, then $v$ is said to
be an ancestor of $u$, and $u$ is said to be a descendent of $v$.
Trivially, each vertex is its own ancestor as well as its own descendent,
but not its own parent or child. If there is a directed path
$u \to u_1 \to \cdots \to u_k$ then $u_k$ is called a $k$-th grandparent
of $u$, and $u$ is called a $k$-th grandchild of $u_k$.

A pedigree $P(X_0)$ with vertex set $U$ is said to be isomorphic to a
pedigree $Q(Y_0)$ with vertex set $V$ if there is a one-one map
$f : U \to V$ such that $u_1 \to u_2$ is an arc in $P(X_0)$ if and only if
$f(u_1) \to f(u_2)$ is an arc in $Q(Y_0)$. Although this is a standard
definition of graph isomorphism, we will be interested in pedigrees in
which the extant vertices are labelled. Therefore, if
$X_0 = Y_0 = \{x_i; 1 \le i \le n\}$ then we will be interested only in
isomorphisms $\pi$ for which $\pi(x_i) = x_i$ for all $1 \le i \le n$.

A motivation to study pedigrees comes from biology, where one is
interested in reconstructing pedigrees of populations. But it is hoped
that the main result in this paper - the non-reconstructibility of
pedigrees from sub-pedigrees - will also be of interest to
combinatorialists interested in the well known reconstruction
conjectures.

Steel and Hein [3] posed and partially solved reconstruction and
enumeration questions about pedigrees. Motivated by results in
phylogenetics, a natural question to ask is: is a pedigree determined up
to isomorphism from the pairwise distances between extant vertices? A
pair of extant vertices $x$ and $y$ in a pedigree may have several common
ancestors; therefore, it is assumed that all possible distances (in the
undirected sense) between all pairs of extant vertices are given. Such
a question is not expected to have a positive answer, as demonstrated
by a counter example in [3]. Despite the counter example, variations
of this question are definitely significant in evolutionary biology.
Steel and Hein considered the following weaker question.

Let $P(X_0)$ be a pedigree. A sub-pedigree $P(Y)$ of $P(X_0)$ is obtained
by deleting every vertex in $P(X_0)$ that has no descendent in $Y$. Now if
sub-pedigrees on all two-element subsets of $X_0$ are given up to
isomorphism, can we construct the sub-pedigree on $X_0$ up to
isomorphism? Steel and Hein presented a counter example in their paper.
They posed the following problem.

Problem 1. Is there an integer $r > 2$ such that every pedigree $P(X_0)$
of order $n > r$ is determined up to isomorphism if all its sub-pedigrees
$P(Y)$ such that $|Y| = r$ are given up to isomorphism?

Combinatorialists familiar with the reconstruction conjectures might
be tempted to dismiss this question; it must therefore be pointed out
that the set $X_0$ in a pedigree is labelled. In other words, "a
sub-pedigree $P(Y)$ given up to isomorphism" is to be interpreted as a
pedigree in which all vertices except the ones in $Y$ are unlabelled. The
following definitions are introduced to make this remark more formal.

Definition 1. Let $n > r \ge 2$ be positive integers. Let $P(X_0)$ and
$U(X_0)$ be two pedigrees of order $n$. The two pedigrees are said to be
$r$-hypomorphic to each other if for every $Y \subseteq X_0$, $|Y| = r$,
there is an isomorphism $\pi_Y$ from the sub-pedigree $P(Y)$ of $P(X_0)$
to the sub-pedigree $U(Y)$ of $U(X_0)$ such that $\pi_Y(x) = x$ for all
$x \in Y$. A pedigree $P(X_0)$ is said to be $r$-reconstructible if for
every pedigree $U(X_0)$ that is $r$-hypomorphic to $P(X_0)$, there is an
isomorphism $\pi$ from $P(X_0)$ to $U(X_0)$ such that $\pi(x) = x$ for all
$x \in X_0$.

Problem 2. Is there an integer r > 2 such that all pedigrees of order 
n > r are r-reconstructible? 

In Section 2 we present a family of counter examples as well as a few
positive results on constant population size pedigrees. We prove that
for every $n \ge 3$, there are pedigrees of order $n$ that are not even
$(n-1)$-reconstructible. The problem of classification of
non-reconstructible pedigrees remains open, and we suspect that it might
have an algebraic structure similar to the Nash-Williams' lemma in edge
reconstruction theory, see [2].

Steel and Hein considered the question of enumerating mutually non- 
isomorphic pedigrees of a fixed depth. A lower bound on the number 
of distinct pedigrees implies, by an information theoretic argument, a 
lower bound on the number of segregating DNA sites that would be 
necessary in order to reconstruct the pedigree of a population from 
the sequence data. In Section 3 we prove tighter lower and upper
bounds, and show that the information theoretic lower bound does not 
increase much. Steel and Hein leave the problem of enumerating general 
pedigrees open. Here we enumerate general pedigrees as well, and again 
show that purely information theoretic arguments as in their paper are 
not sufficient to show that general pedigrees would necessarily require 
significantly more segregating sites for their reconstruction from the 
sequence data. 

2. Reconstruction of pedigrees. 

2.1. A negative result. We solve Problem 2 negatively by constructing
an infinite family of pairs of non-isomorphic pedigrees that have
correspondingly isomorphic sub-pedigrees. That is, we prove the
following.

Theorem 1. For every $n > 2$, there are non-isomorphic pedigrees
$P(X_0)$ and $U(X_0)$ of order $n$ that are $(n-1)$-hypomorphic.

Proof. The proof is divided in two cases. The case $n = 3$ gives the
basic idea, which is then generalised to arbitrary values of $n$.

Case $n = 3$.

Consider the non-isomorphic graphs $K_{1,3}$ and $K_3$. Let the edges
of both graphs be arbitrarily labelled $e_1, e_2, e_3$, where, following
the standard graph theoretic convention, an edge is a set of two
vertices. It is clear that $K_{1,3} - e_i \cong K_3 - e_i$ for all $i$,
where $- e_i$ denotes deletion of the edge $e_i$ and the resulting
isolated vertices. Now suppose that the end vertices of each edge $e_i$
are parents of the vertex $x_i \in X_0$ in each of the pedigrees $P(X_0)$
and $U(X_0)$. Then the pedigrees $P(\{x_i, x_j\})$ and $U(\{x_i, x_j\})$
are isomorphic for all $i, j$, but the pedigrees $P(X_0)$ and $U(X_0)$
are not isomorphic. This example proves the theorem for $n = 3$. The
pedigrees $P(X_0)$, $U(X_0)$, and their sub-pedigrees are shown in
Figure 1.
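
The $n = 3$ case is small enough to check by exhaustive search. The sketch below assumes the same dictionary representation of a pedigree as a map from a vertex to its pair of parents; the vertex names `s0..s3` and `t1..t3`, and all function names, are hypothetical and chosen only for this illustration. It builds the two pedigrees from $K_{1,3}$ and $K_3$, and tests isomorphisms that fix the labelled extant vertices by trying every bijection of the unlabelled vertices.

```python
from itertools import combinations, permutations


def vertices_of(ped):
    return set(ped) | {a for pair in ped.values() for a in pair}


def arcs_of(ped):
    return {(child, a) for child, pair in ped.items() for a in pair}


def sub_pedigree(ped, keep):
    """Delete every vertex that has no descendant in `keep`."""
    seen, stack = set(), list(keep)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(ped.get(v, ()))
    return {v: ped[v] for v in seen if v in ped}


def isomorphic_fixing(p, q, labelled):
    """Brute-force search for an isomorphism p -> q that fixes `labelled`."""
    labelled = set(labelled)
    free_p = sorted(vertices_of(p) - labelled)
    free_q = sorted(vertices_of(q) - labelled)
    if len(free_p) != len(free_q):
        return False
    target = arcs_of(q)
    for image in permutations(free_q):
        f = dict(zip(free_p, image))
        f.update((x, x) for x in labelled)
        if {(f[c], f[a]) for c, a in arcs_of(p)} == target:
            return True
    return False


# The parents of x_i are the end vertices of the edge e_i.
# P is built from the star K_{1,3} (centre s0), U from the triangle K_3.
P = {"x1": ("s0", "s1"), "x2": ("s0", "s2"), "x3": ("s0", "s3")}
U = {"x1": ("t1", "t2"), "x2": ("t2", "t3"), "x3": ("t1", "t3")}
X0 = ["x1", "x2", "x3"]

# All sub-pedigrees on two extant vertices agree ...
pairs_agree = all(
    isomorphic_fixing(sub_pedigree(P, Y), sub_pedigree(U, Y), Y)
    for Y in combinations(X0, 2)
)
# ... but the pedigrees themselves do not.
whole_agree = isomorphic_fixing(P, U, X0)
```

The search is factorial in the number of unlabelled vertices, so it is only meant for toy instances such as this one.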



[Figure 1. Pedigrees based on $K_{1,3}$ and $K_3$: the pedigrees
$P(\{x_1, x_2, x_3\})$ and $U(\{x_1, x_2, x_3\})$, with founders joined by
the edges $e_1, e_2, e_3$, and the sub-pedigrees
$P(\{x_i, x_j\}) \cong U(\{x_i, x_j\})$ for all $i, j$.]

Case $n > 3$.

We have to construct hypergraphs that play the role that $K_{1,3}$ and
$K_3$ play above. We construct a hypergraph $G$ with edge set
$\{g_i; 1 \le i \le n\}$ and a hypergraph $H$ with edge set
$\{h_i; 1 \le i \le n\}$ such that the following conditions are
satisfied.

(1) $G \not\cong H$.

(2) For each $i$, $1 \le i \le n$, $G - g_i \cong H - h_i$; moreover,
there is an isomorphism between $G - g_i$ and $H - h_i$ that preserves
the edge order, that is, vertices in an edge $g_j$ in $G - g_i$ are
mapped to vertices in $h_j$ under such an isomorphism, for each
$j \ne i$.

Once such hypergraphs are constructed, we treat each edge in each
hypergraph as a founder set. We construct pedigrees $T_i$ on founder sets
$g_i$, and pedigrees $U_i$ on founder sets $h_i$ such that

(1) pedigrees $T_i$, $1 \le i \le n$, are vertex-disjoint except
possibly for their founder sets $g_i$;

(2) pedigrees $U_i$, $1 \le i \le n$, are vertex-disjoint except
possibly for their founder sets $h_i$;

(3) pedigrees $T_i$ and $U_i$ are correspondingly isomorphic; moreover,
for all $i, j$, $i \ne j$, an isomorphism between $G - g_j$ and $H - h_j$
that preserves the edge order extends to an isomorphism between $T_i$
and $U_i$;

(4) each of the pedigrees $T_i$ and $U_i$ contains exactly one extant
vertex $x_i$.

The resulting pedigrees $\cup_{i=1}^{n} T_i$ and $\cup_{i=1}^{n} U_i$ are
non-isomorphic (since the hypergraphs $G$ and $H$ are non-isomorphic) but
their sub-pedigrees are correspondingly isomorphic.

Construction of hypergraphs $G$ and $H$

The required hypergraphs are constructed by a simple application of 
linear algebra. 

Let each integer in $\{0, \ldots, 2^n - 1\}$ be written in base 2 as an
$n$-digit number by padding sufficiently many zeros on the left. We count
its digits from the right. The set of $n$-digit binary numbers is denoted
by $[2^n]$. The $i$'th digit of a number $k$ is denoted by $k(i)$, and the
number obtained by setting the $i$'th digit of $k$ to 0 (or 1) is denoted
by $k(i \leftarrow 0)$ (or, respectively, $k(i \leftarrow 1)$). The
number of ones and the number of zeros in $k$ are denoted by $\#1(k)$ and
$\#0(k)$, respectively.

The isomorphism class of a hypergraph $G$ with edge set
$\{g_i; 1 \le i \le n\}$ may be represented by a list of integers $a(k)$,
$k \in [2^n]$, where $a(k)$ is the number of vertices in
$\cap_{i=1}^{n} f_i$, where $f_i = g_i$ if $k(i) = 1$, and
$f_i = \bar{g}_i$ (that is, the complement of $g_i$) if $k(i) = 0$. In
other words, we have to only specify the number of vertices in each
region of the Venn diagram of $g_i$, $1 \le i \le n$. Let the list of
integers $b(k)$, $k \in [2^n]$, similarly denote the isomorphism class
of $H$.

The condition $G - g_i \cong H - h_i$ for $1 \le i \le n$, with an
isomorphism between them that preserves the edge order, may be expressed
as

(1) $a(k(i \leftarrow 0)) + a(k(i \leftarrow 1)) = b(k(i \leftarrow 0)) + b(k(i \leftarrow 1))$; $k \in [2^n]$.

Since we are interested in non-isomorphic hypergraphs $G$ and $H$, we
must find solutions to the above equations so that $a(k) \ne b(k)$ for
some $k \in [2^n]$.

We verify that

(2) $a(k) = 1$, $b(k) = 0$ when $k$ has an even number of 1's;
    $a(k) = 0$, $b(k) = 1$ when $k$ has an odd number of 1's

satisfy Equations (1).

It can be easily verified that $K_{1,3}$ and $K_3 \cup K_1$ do in fact
satisfy the above solutions, where we include an isolated vertex in one
of the graphs purely for algebraic convenience.
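
The verification of (2) against Equations (1) is a finite check for each $n$. The following sketch, with hypothetical helper names, encodes $k(i \leftarrow 0)$ and $k(i \leftarrow 1)$ with bit masks (digit $i$ counted from the right, starting at 1) and tests every pair $(k, i)$.

```python
def set_digit(k, i, bit):
    """Return k with its i'th binary digit (from the right, i >= 1) set to bit."""
    mask = 1 << (i - 1)
    return (k | mask) if bit else (k & ~mask)


def parity_solution(n):
    """Solution (2): a is 1 on even-weight k, b is 1 on odd-weight k."""
    a = {k: 1 - bin(k).count("1") % 2 for k in range(2 ** n)}
    b = {k: bin(k).count("1") % 2 for k in range(2 ** n)}
    return a, b


def satisfies_equation_1(a, b, n):
    """Check Equation (1) for every k in [2^n] and every digit i."""
    return all(
        a[set_digit(k, i, 0)] + a[set_digit(k, i, 1)]
        == b[set_digit(k, i, 0)] + b[set_digit(k, i, 1)]
        for k in range(2 ** n)
        for i in range(1, n + 1)
    )
```

Each side of (1) equals 1 for this solution, because $k(i \leftarrow 0)$ and $k(i \leftarrow 1)$ always have opposite parity.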

From now on we write $[2^n] = [2^n]_e \cup [2^n]_o$, where $[2^n]_e$ is
the set of integers having an even number of 1's in their binary
representation, and $[2^n]_o$ is the set of integers having an odd number
of 1's in their binary representation. In this notation, the hypergraphs
$G$ and $H$ are described as follows: the set $[2^n]_e$ is the set of
vertices of $G$, and a vertex $k \in [2^n]_e$ is in $g_i$ if and only if
$k(i) = 1$. Similarly, the set $[2^n]_o$ is the vertex set of $H$, and a
vertex $k \in [2^n]_o$ is in edge $h_i$ if and only if $k(i) = 1$.

The vertex $k = 0$ is in $G$, but is an isolated vertex, and is included
at this stage only for algebraic convenience, and may be deleted after
completing the construction of non-reconstructible pedigrees.

It is clear that $G$ and $H$ are non-isomorphic, since each of them has
$2^{n-1}$ vertices, but $G$ has the isolated vertex 0, while $H$ has no
isolated vertex. What is an isomorphism between $G - g_i$ and $H - h_i$?
An edge order preserving isomorphism from $G - g_i$ to $H - h_i$ must map
vertices in a region of the Venn diagram of $\cup_{j \ne i}\, g_j$ to the
corresponding region of the Venn diagram of $\cup_{j \ne i}\, h_j$.
Consider any $k \in [2^n]$. The vertex $k(i \leftarrow 0)$ is in $g_j$
for some $j \ne i$ if and only if the vertex $k(i \leftarrow 1)$ is in
$h_j$, because the two vertices differ only in their $i$'th digit.
Therefore, if $k(i \leftarrow 0)$ is in $G$, then an edge order
preserving isomorphism between $G - g_i$ and $H - h_i$ must map the
vertex $k(i \leftarrow 0)$ to the vertex $k(i \leftarrow 1)$. Similarly,
if the vertex $k(i \leftarrow 1)$ is in $G$, then an edge order
preserving isomorphism between $G - g_i$ and $H - h_i$ must map the
vertex $k(i \leftarrow 1)$ to the vertex $k(i \leftarrow 0)$. Moreover,
this isomorphism is unique. On the standard hypercube on $[2^n]$, each
vertex in $G - g_i$ is mapped to its neighbour along the $i$'th axis,
which is in $H - h_i$.

Example 1. Let $n = 4$, and let the hypergraphs $G$ and $H$ be defined
on vertex sets $[2^4]_e$ and $[2^4]_o$ as follows:

$g_1 = \{0011, 0101, 1001, 1111\}$, $g_2 = \{0011, 0110, 1010, 1111\}$,

$g_3 = \{0101, 0110, 1100, 1111\}$, $g_4 = \{1001, 1010, 1100, 1111\}$,

$h_1 = \{0001, 0111, 1011, 1101\}$, $h_2 = \{0010, 0111, 1011, 1110\}$,

$h_3 = \{0100, 0111, 1101, 1110\}$, $h_4 = \{1000, 1011, 1101, 1110\}$,

where $g_i$ are the edges of $G$ and $h_i$ are the edges of $H$. The
isomorphism $\pi_1$ from $G - g_1$ to $H - h_1$ that preserves the edge
order is given by

$\pi_1(0000) = 0001$, $\pi_1(0011) = 0010$, $\pi_1(0101) = 0100$, $\pi_1(1001) = 1000$,

$\pi_1(0110) = 0111$, $\pi_1(1010) = 1011$, $\pi_1(1100) = 1101$, $\pi_1(1111) = 1110$.

Observe that $\pi_1(g_2) = h_2$, $\pi_1(g_3) = h_3$, and
$\pi_1(g_4) = h_4$ under this map.
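
The hypergraphs and the digit-flipping map are easy to generate for any $n$. The sketch below, with hypothetical function names, builds the edges $g_i$ and $h_i$ as sets of integers and checks that flipping the $i$'th digit carries every $g_j$, $j \ne i$, onto $h_j$; for $n = 4$ it reproduces the sets listed in Example 1.

```python
def hypergraphs(n):
    """Vertex sets [2^n]_e, [2^n]_o and edges g_i, h_i (i = 1..n)."""
    even = [k for k in range(2 ** n) if bin(k).count("1") % 2 == 0]
    odd = [k for k in range(2 ** n) if bin(k).count("1") % 2 == 1]
    # a vertex k lies in edge i exactly when its i'th digit is 1
    g = {i: {k for k in even if (k >> (i - 1)) & 1} for i in range(1, n + 1)}
    h = {i: {k for k in odd if (k >> (i - 1)) & 1} for i in range(1, n + 1)}
    return even, odd, g, h


def flip(k, i):
    """Neighbour of k along the i'th axis of the hypercube."""
    return k ^ (1 << (i - 1))


def flip_preserves_edge_order(n, i):
    """Does flipping digit i map g_j onto h_j for every j != i?"""
    _, _, g, h = hypergraphs(n)
    return all({flip(k, i) for k in g[j]} == h[j] for j in g if j != i)
```

The vertex 0 belongs to no $g_i$, while every odd-weight vertex belongs to some $h_i$, which is the isolated-vertex argument used above to separate $G$ from $H$.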

Construction of $T_i$ and $U_i$

As stated earlier, for each $i$, pedigrees $T_i$ and $U_i$ must be so
constructed that (the unique) edge order preserving isomorphism between
$G - g_j$ and $H - h_j$ extends to an isomorphism between $T_i$ and $U_i$
for all $j \ne i$.

Let a balanced binary tree $T_i$ be defined so that $x_i$ is its root and
$g_i$ is its set of leaves. By convention, the root $x_i$ is the lowest
vertex (at depth 0) in $T_i$, and the leaves are the highest vertices (at
depth $n - 2$) in $T_i$. For a vertex $t$ in $T_i$, let $T_i(t)$ be the
subtree of $T_i$ induced by $t$ and all vertices in $T_i$ that are above
$t$. Let $t_0$ and $t_1$ be the parents of $t$. The subtree $T_i(t)$ is a
union of subtrees $L(t)$ and $R(t)$, where $L(t)$ is induced by vertices
$t$, $t_0$, and all vertices above $t_0$, and the subtree $R(t)$ is
induced by vertices $t$, $t_1$, and all vertices above $t_1$. We call
$L(t)$ the left subtree at $t$, and $R(t)$ the right subtree at $t$.

Let $i_1, i_2, \ldots, i_{n-1}$ be the integers $1 \le j \le n$,
$j \ne i$, in arbitrary order. The vertices in $g_i$ are grouped in such
a way that for each vertex $t$ at depth $k$, $0 \le k \le n - 3$, if a
vertex $p \in g_i$ is a leaf of $L(t)$ then $p(i_{k+1}) = 0$, and if a
vertex $p \in g_i$ is a leaf of $R(t)$ then $p(i_{k+1}) = 1$.

The vertices in $h_i$ are partitioned, and a binary tree $U_i$ is
constructed analogously for the same ordering $i_j$, $1 \le j \le n - 1$.

For $n = 5$ and $i = 5$ and the ordering $i_1 = 2$, $i_2 = 3$, $i_3 = 1$,
$i_4 = 4$, the trees $T_5$ and $U_5$ are shown in Figure 2.
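
The left-to-right order of the leaves follows directly from the grouping rule: a leaf goes left when the relevant digit is 0 and right when it is 1, so the leaves are sorted by the digits at positions $i_1, \ldots, i_{n-2}$. A minimal sketch, assuming this reading of the rule (the function name and the `parity` argument are hypothetical):

```python
def leaves_in_tree_order(n, i, ordering, parity):
    """Leaves of the binary tree over edge i, read from left to right.

    ordering: the list [i_1, ..., i_{n-1}] of digit positions other than i.
    parity:   0 for the leaves g_i of T_i, 1 for the leaves h_i of U_i.
    """
    def digit(k, j):
        return (k >> (j - 1)) & 1

    edge = [
        k for k in range(2 ** n)
        if digit(k, i) == 1 and bin(k).count("1") % 2 == parity
    ]
    # Only the first n-2 positions branch the tree; the digit at the
    # last position i_{n-1} is then forced by the parity of the leaf.
    edge.sort(key=lambda k: [digit(k, j) for j in ordering[: n - 2]])
    return [format(k, "0{}b".format(n)) for k in edge]


t5 = leaves_in_tree_order(5, 5, [2, 3, 1, 4], parity=0)
u5 = leaves_in_tree_order(5, 5, [2, 3, 1, 4], parity=1)
```

With the ordering $i_1 = 2$, $i_2 = 3$, $i_3 = 1$, $i_4 = 4$ this yields the leaf rows that survive from Figure 2.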

We show that for every $j \ne i$, the unique isomorphism between
$G - g_j$ and $H - h_j$ extends to an isomorphism between $T_i$ and $U_i$.

Let $b = (b_1, \ldots, b_j)$, $0 \le j \le n - 2$, be a $j$-tuple of 0's
and 1's. Extending a notation introduced earlier, let $g_i(b)$ denote the
set $\{k \in g_i \mid k(i_1) = b_1, k(i_2) = b_2, \ldots, k(i_j) = b_j\}$,
which is the set of leaves of a binary subtree of $T_i$ rooted at the
vertex $t(b)$ at depth $j$. For example, when $b = (0)$, $g_i(b)$ is the
set of leaves above the left parent of $x_i$, and $t(b)$ is the left
parent of $x_i$. The set $h_i(b)$ and the vertex $u(b)$ are analogously
defined for $U_i$. By convention, an empty tuple $b$ defines the sets
$g_i$ and $h_i$, and the trees $T_i$ and $U_i$, rooted at $x_i$; and a
tuple $b$ of length $n - 2$ defines singleton subsets $\{t(b)\}$ of
$g_i$, and $\{u(b)\}$ of $h_i$.

[Figure 2. Binary pedigrees $T_5$ and $U_5$ for $n = 5$, rooted at $x_5$.
Leaves of $T_5$ (the set $g_5$), left to right: 11000, 10001, 10100,
11101, 10010, 11011, 11110, 10111. Leaves of $U_5$ (the set $h_5$), left
to right: 10000, 11001, 11100, 10101, 11010, 10011, 10110, 11111.]

A tuple $b$ of length $n - 2$ also uniquely determines the digits
$t(b)(i_{n-1})$ and $u(b)(i_{n-1})$, since we know that $\#1(t(b))$ is
even and $\#1(u(b))$ is odd. Also, if $t(b)(i_{n-1}) = 1$ then
$u(b)(i_{n-1}) = 0$, and if $t(b)(i_{n-1}) = 0$ then
$u(b)(i_{n-1}) = 1$. Therefore, the map $t(b) \to u(b)$ for all tuples of
length at most $n - 2$ extends the isomorphism between
$G - g_{i_{n-1}}$ and $H - h_{i_{n-1}}$.

Let $b$ be a tuple as above. Extending the notation $k(i \leftarrow 0)$
to $b$, we define $b(i \leftarrow 0)$ to be the tuple obtained by setting
$b_i = 0$ in $b$, and $b(i \leftarrow 1)$ to be the tuple obtained by
setting $b_i = 1$ in $b$.

Let $b$ be a tuple of length $n - 2$ and $j \le n - 2$. By an argument as
in the above paragraph, we have

(1) if $t(b(j \leftarrow 0))(i_{n-1}) = 1$ then $u(b(j \leftarrow 1))(i_{n-1}) = 1$;

(2) if $t(b(j \leftarrow 0))(i_{n-1}) = 0$ then $u(b(j \leftarrow 1))(i_{n-1}) = 0$;

(3) if $t(b(j \leftarrow 1))(i_{n-1}) = 1$ then $u(b(j \leftarrow 0))(i_{n-1}) = 1$;

(4) if $t(b(j \leftarrow 1))(i_{n-1}) = 0$ then $u(b(j \leftarrow 0))(i_{n-1}) = 0$.

Therefore, for each $j$, $1 \le j \le n - 2$, the map defined by

(1) $t(b) \to u(b)$ for all tuples of length at most $j - 1$;

(2) $t(b(j \leftarrow 0)) \to u(b(j \leftarrow 1))$ for all tuples of
length at least $j$; and

(3) $t(b(j \leftarrow 1)) \to u(b(j \leftarrow 0))$ for all tuples of
length at least $j$

extends the isomorphism between $G - g_{i_j}$ and $H - h_{i_j}$. Observe
that this map sends the vertices in the left subtree $L(t(b))$ in $T_i$
to the vertices in the right subtree $R(u(b))$ in $U_i$, and the vertices
in the right subtree $R(t(b))$ in $T_i$ to the vertices in the left
subtree $L(u(b))$ in $U_i$, for each tuple $b$ of length $j - 1$, that is,
for each vertex at which the tree branches on the digit $i_j$. Since the
trees $T_i$ and $T_j$ (and trees $U_i$ and $U_j$) are disjoint except for
their founders for all $i \ne j$, the isomorphism between $G - g_j$ and
$H - h_j$ extends to an isomorphism between pedigrees
$P(X_0 \setminus \{x_j\})$ and $U(X_0 \setminus \{x_j\})$ for all $j$.

An isomorphism between $G - g_{i_2}$ and $H - h_{i_2}$ that extends to an
isomorphism between $T_i$ and $U_i$, $i \ne i_2$, is schematically shown
in Figure 3. □

[Figure 3. Isomorphism between $G - g_{i_2}$ and $H - h_{i_2}$ extends to
an isomorphism between $T_i$ and $U_i$. Labelled in the figure: the leaf
set $\{k \in g_i \mid k(i_1) = b_1, k(i_2) = 0\}$.]

Remark 1. The pedigrees constructed above do not admit a valid gender
labelling. That is, we cannot assign labels $m$ (male) and $f$ (female)
to all vertices so that each vertex (except founders) has one male
parent and one female parent. For example, in the $n = 3$ case, $K_3$ is
not a bipartite graph, so a valid gender labelling is impossible. But the
examples can be easily modified to create non-reconstructible pedigrees
that also admit valid gender labels. Each vertex in a pedigree
constructed above may be duplicated, and one vertex may be treated male
and the other female, as shown in Figure 4. At the bottom of the tree
$T_i$ (or $U_i$), the vertex $x_i$ is duplicated as vertices $x_i^m$ and
$x_i^f$, and the new vertex $x_i$ is a child of $x_i^m$ and $x_i^f$.

[Figure 4. Construction of a pedigree with a valid gender labelling.]

Remark 2. Let $k, k' \in [2^n]$ be any two adjacent vertices on the
hypercube. From Equation (1), if $a(k) - b(k) = p > 0$ for some $p$, then
$b(k') - a(k') = p$, regardless of which digit $k$ and $k'$ differ at. In
fact, by connectivity of the hypercube, we have $a(r) - b(r) = p$ for all
vertices $r \in [2^n]$ that are at even distance from $k$ on the
hypercube, and $b(r) - a(r) = p$ for all vertices $r \in [2^n]$ that are
at odd distance from $k$ on the hypercube. This further implies that the
hypergraphs $G$ and $H$ constructed in the above counter example have a
special structure: for each $1 \le i \le n$,
$|g_i| = |h_i| \ge 2^{n-2}$. Let $G(d)$ be the hypergraph with edge set
$\{g_i(d); 1 \le i \le n\}$, where $g_i(d)$ is the set of grandparents of
$x_i$ at depth $d$ in the pedigree $P(X_0)$, and let $H(d)$ and $h_i(d)$
be similarly defined for the pedigree $U(X_0)$, then the hypergraphs
$G(d)$ and $H(d)$ must be isomorphic whenever $d < n - 2$.
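
The alternating pattern used in this remark follows from Equation (1) in one line. The derivation below spells it out for two vertices $k$ and $k'$ that differ only in their $i$'th digit, and then iterates along a path of length $\ell$ on the hypercube; the symbol $\ell$ is introduced here only for this sketch.

```latex
\begin{align*}
a(k) + a(k') &= b(k) + b(k')
  && \text{Equation (1) for the pair } \{k, k'\} \\
a(k) - b(k) &= -\bigl(a(k') - b(k')\bigr) = b(k') - a(k') \\
a(r) - b(r) &= (-1)^{\ell}\,\bigl(a(k) - b(k)\bigr)
  && \text{for } r \text{ at distance } \ell \text{ from } k
\end{align*}
```

Since $a$ and $b$ are non-negative, $p > 0$ forces $a(r) \ge p$ at even distance from $k$ and $b(r) \ge p$ at odd distance, which is where the lower bound on $|g_i|$ and $|h_i|$ comes from.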

We end this subsection with a conjecture motivated by the observations
made in Remark 2.

Conjecture 1. The counter example constructed in Theorem 1 is minimal.
In other words, if a pedigree $P(X_0)$ of order $n$ is not
$(n-1)$-reconstructible then it has depth at least $n - 2$, and there are
at least $2^{n-1}$ ancestors at depth $n - 2$.

Remark 3. Let $G$ and $H$ be simple graphs with edge sets
$E(G) = \{g_i; 1 \le i \le m\}$ and $E(H) = \{h_i; 1 \le i \le m\}$,
respectively, such that $G - g_i \cong H - h_i$ for all $1 \le i \le m$.
Then the edge reconstruction conjecture states that $G \cong H$ provided
$m > 3$. The condition $m > 3$ is required since $K_{1,3}$ and $K_3$ -
the graphs used as the base case of our construction of
non-reconstructible pedigrees - are not edge reconstructible. Although
no counter examples are yet known, Nash-Williams proved a
characterisation of (hypothetical) counter examples to edge
reconstruction. His characterisation was based on a generalisation of
ideas earlier introduced by Lovász [1]. Without going into details, we
note that the counter examples presented here have certain similarities
with the characterisation by Nash-Williams. It may be possible to
exploit such similarities to prove Conjecture 1.

2.2. A positive result. Let $P(X_0)$ be a discrete generation pedigree
on $X_0$ of order $n > 2$. Let
$S_{n-1}(P) = \{P(Y) \mid Y \subset X_0, |Y| = n - 1\}$. Consider the edge
labelled (multi)graph $G_1$ whose vertex set is $X_1$ (that is, the
vertices at depth 1), and vertices $x, y \in X_1$ are joined by an edge
$e_i$ if they are the parents of $x_i$.

Lemma 1. If there are vertices $x_i$ and $x_j$ in $X_0$ that have the
same parents, then $P(X_0)$ is uniquely determined by $S_{n-1}(P)$.

Proof. The situation in the lemma is recognised by looking at
$P(X_0 \setminus x_k)$, where $x_k \notin \{x_i, x_j\}$. Now $P(X_0)$ is
uniquely obtained from $P(X_0 \setminus x_i)$ by joining $x_i$ to the
parents of $x_j$. □

Lemma 2. If $n > 3$ and if $G_1$ contains a cycle then $P(X_0)$ is
uniquely determined by $S_{n-1}(P)$.

Proof. Let $e_i$ be an edge in a cycle in $G_1$. The end vertices of
$e_i$ are the two parents of $x_i$. Since the set of half brothers of
$x_i$ is known from the collection $S_{n-1}$, the parents of $x_i$ are
uniquely recognised in $P(X_0 \setminus x_i)$.

Note that we need the condition $n > 3$ because otherwise we would get
a counter example based on $G_1 = K_3$ or $G_1 = K_{1,3}$. □

Corollary 1. If $|X_1| \le n$ and $n > 3$ then $S_{n-1}(P)$ determines
$P(X_0)$ up to congruence.

Proof. If no two vertices in $X_0$ have the same two parents then $G_1$
has $n$ simple edges (that is, no two edges are parallel edges) on at
most $n$ vertices, and there is a cycle in $G_1$. □
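
The hypotheses of Lemma 1 and Lemma 2 are simple properties of the multigraph $G_1$. The sketch below, with a hypothetical function name and input format, reads the edges $e_i$ from a map that sends each extant vertex to its two (distinct) parents, reports whether two extant vertices share both parents, and uses a union-find structure to detect a cycle among the distinct edges.

```python
def classify_depth_one(parent_pairs):
    """Return (shared_parents, has_cycle) for the multigraph G_1.

    parent_pairs: dict mapping each extant vertex x_i to the pair of its
                  parents, i.e. the end vertices of the edge e_i.
    """
    edges = [frozenset(pair) for pair in parent_pairs.values()]
    # Lemma 1 situation: two extant vertices with the same two parents
    shared_parents = len(set(edges)) < len(edges)

    root = {}

    def find(v):
        root.setdefault(v, v)
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    # Lemma 2 situation: a cycle among the distinct (simple) edges
    has_cycle = False
    for edge in set(edges):
        u, v = tuple(edge)
        ru, rv = find(u), find(v)
        if ru == rv:
            has_cycle = True
        else:
            root[ru] = rv
    return shared_parents, has_cycle
```

Parallel edges are reported through `shared_parents` rather than counted as cycles, which matches the way the two lemmas split the cases.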

We end this section with another conjecture.

Conjecture 2. Discrete generation pedigrees of order $n$ that have a
constant population in each generation are $r$-reconstructible for
$r > \log n$.

This conjecture is true if Conjecture 1 is true. For suppose that
Conjecture 1 is true but Conjecture 2 is not true, and that there is
a pedigree of order $n$ that is not $r$-reconstructible for some
$r > \log n$. Therefore, for some $r > \log n$, there is a sub-pedigree
of order $r + 1$ that is not $r$-reconstructible. Such a sub-pedigree
must have depth at least $r - 1$, and must have at least $2^r$ vertices
at depth $r - 1$, implying that $r \le \log n$. Thus if $r > \log n$ then
we have a contradiction; therefore, all sub-pedigrees of order $r + 1$
are $r$-reconstructible when $r > \log n$, and we can complete the
reconstruction inductively.

3. Enumeration of pedigrees 

Let $N(n, d)$ be the number of distinct (mutually non-isomorphic)
discrete generation pedigrees of depth $d$ with $n$ vertices in each
generation. As before, the extant vertices are assumed to be labelled,
and other vertices are assumed to be unlabelled.

In a general pedigree, the depth of a vertex $u$ is the largest integer
$k$ for which $u$ is a $k$'th grandparent of an extant vertex. The depth
of a pedigree is the largest integer $d$ for which there is a vertex of
depth $d$ in the pedigree. Let the number of distinct general pedigrees
of depth $d$ with constant number $n$ of vertices at each depth be
$M(n, d)$.

The purpose of this section is to derive lower and upper bounds
on $N(n, d)$ and $M(n, d)$. The bounds are relevant to an information
theoretic argument that was used by Steel and Hein in the context of
a reconstruction question.



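Theorem 2.

(3) $\left(\binom{n}{2}\, n^{n-3}\right)^{d} \le N(n, d) \le$ [upper bound illegible in the source]

(4) [first factor illegible in the source] $\cdot \prod_{k=0}^{d-2} \bigl((n/2)(d-1-k)\bigr)^{n} \le M(n, d) \le$ [upper bound illegible in the source]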

Proof. Let $P(X_0)$ be a discrete generation pedigree of depth $d$ on
$X_0$. Let $X_i$ be the set of vertices at depth $i$. Let $|X_i| = n$ for
all $i$, $0 \le i \le d$. For each $i$, $1 \le i \le d$, define a graph
$G_i$ as follows: the vertex set of $G_i$ is $X_i$, and $\{u, v\}$ is an
edge in $G_i$ if $u$ and $v$ have a child in $X_{i-1}$. Thus
$1 \le e(G_i) \le n$ for $1 \le i \le d$, where $e(G)$ denotes the number
of edges of a graph $G$. We restrict ourselves to bipartite graphs $G_i$
so that it is possible to assign valid gender labels to the vertices of
pedigrees.

Let $S(n, k)$ denote the Stirling number of the second kind. There
are $S(n, k)$ partitions of $X_0$ in $k$ groups of siblings, where
siblings are vertices that share both parents. If the vertices of $G_1$
are labelled and $k = e(G_1)$, then there are $k!$ ways of assigning the
groups of siblings to pairs of parents. Therefore, each labelled graph
$G_1$ gives $S(n, k)\,k!$ labelled pedigrees of depth 1. Some of those
pedigrees may be isomorphic to each other since there may be
automorphisms of $G_1$ that permute the edges of $G_1$ non-trivially.
Therefore, the number of distinct pedigrees of depth 1 that can be
obtained from a labelled graph $G_1$ is given by

$$N(n, 1, G_1) \ge \frac{S(n, k)\,k!}{|\mathrm{aut}\,LG_1|},$$

where $LG_1$ denotes the line graph of $G_1$, and $\mathrm{aut}\,G$
denotes the automorphism group of a graph $G$. If every non-trivial
automorphism of $G_1$ permutes the edges of $G_1$ non-trivially then the
number of distinct pedigrees of depth 1 that can be obtained from $G_1$
is given by

$$N(n, 1, G_1) \ge \frac{S(n, k)\,k!}{|\mathrm{aut}\,G_1|}.$$

Each non-trivial automorphism of a graph $G$ permutes the edges of $G$
non-trivially if and only if $G$ has no isolated edges and not more than
one isolated vertex. Therefore,

$$N(n, 1) \ge \sum_{G} \frac{S(n, e(G))\,e(G)!}{|\mathrm{aut}\,G|},$$

where the summation is over all distinct bipartite graphs $G$ having $n$
vertices, at least 1 and at most $n$ edges, at most one isolated vertex,
and no isolated edges.

Pedigrees of depth 1 considered above have the additional property
that they have no non-trivial automorphisms that fix each vertex in
$X_0$, implying that the vertices of $X_1$ are distinguishable in such
pedigrees. Therefore,

$$N(n, d) \ge \left(\sum_{G} \frac{S(n, e(G))\,e(G)!}{|\mathrm{aut}\,G|}\right)^{d},$$

where the summation is over all graphs of the type described above.
Summing over only graphs that have $n - 1$ edges, we have

$$\frac{S(n, e(G))\,e(G)!}{|\mathrm{aut}\,G|} = \frac{\binom{n}{2}\,(n-1)!}{|\mathrm{aut}\,G|}.$$

But $n!/|\mathrm{aut}\,G|$ is the number of labelled graphs isomorphic to
$G$, and there are $n^{n-2}$ labelled trees on $n$ vertices. Therefore,
summing over trees, we get

$$N(n, d) \ge \left(\binom{n}{2}\,\frac{(n-1)!}{n!}\,n^{n-2}\right)^{d} = \left(\binom{n}{2}\, n^{n-3}\right)^{d}.$$

The upper bound on $N(n, d)$ is obtained by counting fully labelled
pedigrees that do not even possibly admit a valid gender labelling.
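
The two facts that drive the sum over trees can be checked numerically: $S(n, n-1) = \binom{n}{2}$, and, using Cayley's count of $n^{n-2}$ labelled trees, the per-generation sum equals $\binom{n}{2} n^{n-3}$. The sketch below is a sanity check under these assumptions; the closed form `tree_lower_bound` is the value obtained by that calculation and is assumed here rather than quoted from the source, whose displayed bound is not legible. Function names are hypothetical.

```python
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for m in range(1, n + 1):
        for j in range(1, min(m, k) + 1):
            table[m][j] = j * table[m - 1][j] + table[m - 1][j - 1]
    return table[n][k]


def tree_sum(n):
    """Sum over trees G of S(n, n-1) (n-1)! / |aut G|.

    Uses sum over trees of n!/|aut G| = n^(n-2) (Cayley's formula),
    so the sum equals S(n, n-1) (n-1)! n^(n-2) / n!.
    """
    numerator = stirling2(n, n - 1) * factorial(n - 1) * n ** (n - 2)
    return numerator // factorial(n)


def tree_lower_bound(n, d):
    """Assumed closed form (C(n,2) n^(n-3))^d, for n >= 3."""
    return (comb(n, 2) * n ** (n - 3)) ** d
```

For $n = 4$ the per-generation value is $6 \cdot 4 = 24$, so two generations already give $24^2 = 576$ pairwise distinct pedigrees of this restricted type.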

We derive a lower bound on $M(n, d)$ by enumerating a special subclass
of general pedigrees that is described next. Consider pedigrees of
depth $d$ and order $n$ that satisfy the conditions:

(1) there are $n$ vertices at each depth $k \le d$,

(2) each vertex at depth $k \le d - 2$ has exactly one parent at depth
$k + 1$,

(3) distinct vertices at depth $k \le d - 2$ have distinct parents at
depth $k + 1$,

(4) at each depth $k$, $k \le d - 1$, there are $n/2$ vertices of each
gender,

(5) the pedigree of depth 1 induced by vertices in $X_{d-1} \cup X_d$ has
no non-trivial automorphisms that fix vertices in $X_{d-1}$.

The conditions imply that given any vertex $v$ at depth $k$,
$k \le d - 1$, there is a unique path of length $k$ beginning at some
vertex $u$ in $X_0$ and ending at $v$. Therefore, vertices at depth at
most $d - 1$ are distinguishable. The last condition above makes the
vertices at depth $d$ distinguishable as well. Therefore, no two
pedigrees described by the above conditions are isomorphic. This allows
us to derive a lower bound on $M(n, d)$:

$$M(n, d) \ge [\text{first factor, illegible in the source}] \cdot \prod_{k=0}^{d-2} \bigl((n/2)(d-1-k)\bigr)^{n},$$

where the first factor is a lower bound on the number of distinct
pedigrees of depth 1 that are induced by $X_{d-1} \cup X_d$, vertices in
$X_{d-1}$ being labelled. For a vertex at depth $k \le d - 2$, the parent
that is not at depth $k + 1$ may be chosen from the $(n/2)(d - 1 - k)$
distinguishable vertices at depth $k + 2$ or more. This explains the
second factor.

An upper bound on $M(n, d)$ is obtained by counting the number of
labelled directed graphs in which each vertex has out-degree 2. □

Remark 4. Steel and Hein give the information theoretic argument that
if there are $s$ segregating sites in DNA sequences obtained from $n$
extant individuals, then there are $4^{ns}$ possible combinations of
sequences. Therefore, $4^{ns}$ must be at least $N(n, d)$ (or $M(n, d)$,
depending on what assumptions are made about pedigrees) to be able to
reconstruct their pedigree up to depth $d$. They derive a lower bound on
$s$ given by $(d/3)\log n$ for reconstruction of discrete generation
constant population size pedigrees. They comment that in reality the
number of sites required is likely to be much higher due to
under-counting of isomorphism classes and due to the stochastic nature
of sequence evolution. Theorem 2 gives an information theoretic lower
bound on $s$ that is about $(d/2)\log n$ for discrete generation constant
population size pedigrees, and a bound of about $(d/2)\log(nd)$ for
general pedigrees. Moreover, the bounds based on the upper bounds on
$N(n, d)$ and $M(n, d)$ are only about $d\log n$ and $d\log(nd)$,
respectively, for discrete generation and general pedigrees.
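
To see the size of the bound for concrete values, the requirement that $4^{ns}$ be at least the number of pedigrees can be solved for $s$. The sketch below assumes the lower bound $N(n, d) \ge \left(\binom{n}{2} n^{n-3}\right)^{d}$ obtained by summing over trees, and takes logarithms to base 4 on the count and base 2 for the comparison value $(d/2)\log n$; both the closed form and the choice of bases are assumptions of this illustration, and the function name is hypothetical.

```python
from math import comb, log, log2


def min_segregating_sites(n, d):
    """Smallest real s with 4^(n s) >= (C(n,2) n^(n-3))^d, for n >= 3."""
    # log base 4 of the assumed lower bound on the number of pedigrees
    log4_count = d * (log(comb(n, 2), 4) + (n - 3) * log(n, 4))
    return log4_count / n


s_exact = min_segregating_sites(1000, 10)
s_approx = (10 / 2) * log2(1000)  # the "(d/2) log n" estimate
```

For $n = 1000$ and $d = 10$ the two values differ by well under one site, which illustrates why sharper enumeration changes the information theoretic bound so little.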

Remark 5. If we assume that no vertex at depth $k$ has a parent at
depth more than $k + t + 1$ then we have

[displayed bound missing in the source]

This gives a lower bound of about $(d/2)\log(nt)$ on the number of
segregating sites required for pedigree reconstruction.